real system
Deterministic World Models for Verification of Closed-loop Vision-based Systems
Geng, Yuang, Zhou, Zhuoyang, Zhang, Zhongzheng, Pan, Siyuan, Tran, Hoang-Dung, Ruchkin, Ivan
Verifying closed-loop vision-based control systems remains a fundamental challenge due to the high dimensionality of images and the difficulty of modeling visual environments. While generative models are increasingly used as camera surrogates in verification, their reliance on stochastic latent variables introduces unnecessary overapproximation error. To address this bottleneck, we propose a Deterministic World Model (DWM) that maps system states directly to generative images, effectively eliminating uninterpretable latent variables to ensure precise input bounds. The DWM is trained with a dual-objective loss function that combines pixel-level reconstruction accuracy with a control difference loss to maintain behavioral consistency with the real system. We integrate DWM into a verification pipeline utilizing Star-based reachability analysis (StarV) and employ conformal prediction to derive rigorous statistical bounds on the trajectory deviation between the world model and the actual vision-based system. Experiments on standard benchmarks show that our approach yields significantly tighter reachable sets and better verification performance than a latent-variable baseline.
- North America > United States > Florida > Alachua County > Gainesville (0.14)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.87)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
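The conformal prediction step mentioned in the DWM abstract can be illustrated with a minimal split-conformal sketch. This is not the authors' implementation: the function name `conformal_bound` and the calibration scores are hypothetical, and the scores are assumed to be exchangeable per-trajectory deviations between the world model and the real system.

```python
import math

def conformal_bound(calibration_scores, alpha):
    """Split-conformal bound: a fresh exchangeable score stays at or below
    the returned value with probability at least 1 - alpha."""
    n = len(calibration_scores)
    # Rank of the order statistic used as the bound.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration samples to certify this alpha.
        return math.inf
    return sorted(calibration_scores)[k - 1]

# Hypothetical calibration data: max deviation observed on 19 trajectories.
scores = list(range(1, 20))
bound = conformal_bound(scores, alpha=0.1)
print(bound)
```

With few calibration trajectories and a small `alpha`, the bound becomes infinite, which signals that more calibration data is needed rather than that the system is unsafe.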
SOCRATES: Simulation Optimization with Correlated Replicas and Adaptive Trajectory Evaluations
Zhang, Haoting, Chen, Haoxian, Zhan, Donglin, Zhao, Hanyang, Lam, Henry, Tang, Wenpin, Yao, David, Zheng, Zeyu
The field of simulation optimization (SO) encompasses various methods developed to optimize complex, expensive-to-sample stochastic systems. Established methods include, but are not limited to, ranking-and-selection for finite alternatives and surrogate-based methods for continuous domains, with broad applications in engineering and operations management. The recent advent of large language models (LLMs) offers a new paradigm for exploiting system structure and automating the strategic selection and composition of these established SO methods into a tailored optimization procedure. This work introduces SOCRATES (Simulation Optimization with Correlated Replicas and Adaptive Trajectory Evaluations), a novel two-stage procedure that leverages LLMs to automate the design of tailored SO algorithms. The first stage constructs an ensemble of digital replicas of the real system. An LLM is employed to implement causal discovery from a textual description of the system, generating a structural "skeleton" that guides the sample-efficient learning of the replicas. In the second stage, this replica ensemble is used as an inexpensive testbed to evaluate a set of baseline SO algorithms. An LLM then acts as a meta-optimizer, analyzing the performance trajectories of these algorithms to iteratively revise and compose a final, hybrid optimization schedule. This schedule is designed to be adaptive, with the ability to be updated during the final execution on the real system when the optimization performance deviates from expectations. By integrating LLM-driven reasoning with LLM-assisted trajectory-aware meta-optimization, SOCRATES creates an effective and sample-efficient solution for complex SO problems.
- North America > United States > Hawaii (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
SPiDR: A Simple Approach for Zero-Shot Safety in Sim-to-Real Transfer
As, Yarden, Qu, Chengrui, Unger, Benjamin, Kang, Dongho, van der Hart, Max, Shi, Laixi, Coros, Stelian, Wierman, Adam, Krause, Andreas
Deploying reinforcement learning (RL) safely in the real world is challenging, as policies trained in simulators must face the inevitable sim-to-real gap. Robust safe RL techniques are provably safe but difficult to scale, while domain randomization is more practical yet prone to unsafe behaviors. We address this gap by proposing SPiDR, short for Sim-to-real via Pessimistic Domain Randomization, a scalable algorithm with provable guarantees for safe sim-to-real transfer. SPiDR uses domain randomization to incorporate the uncertainty about the sim-to-real gap into the safety constraints, making it versatile and highly compatible with existing training pipelines. Through extensive experiments on sim-to-sim benchmarks and two distinct real-world robotic platforms, we demonstrate that SPiDR effectively ensures safety despite the sim-to-real gap while maintaining strong performance.
- Europe > Switzerland > Zürich > Zürich (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Virginia (0.04)
- Health & Medicine (1.00)
- Leisure & Entertainment > Games (0.54)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.82)
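The SPiDR abstract does not spell out how uncertainty enters the safety constraint. One simple possibility, shown here only as an illustrative sketch and not as the authors' formulation, is to inflate the estimated safety cost by the disagreement between randomized simulator domains. The names `pessimistic_cost`, `is_safe`, `penalty_weight`, and `budget` are assumptions introduced for this example.

```python
from statistics import mean, pstdev

def pessimistic_cost(costs_across_domains, penalty_weight):
    """Mean safety cost over randomized domains, inflated by their spread.

    Assumption: the spread across domains is used as a proxy for
    uncertainty about the sim-to-real gap.
    """
    spread = pstdev(costs_across_domains)
    return mean(costs_across_domains) + penalty_weight * spread

def is_safe(costs_across_domains, penalty_weight, budget):
    """Check the safety constraint against the pessimistic cost estimate."""
    return pessimistic_cost(costs_across_domains, penalty_weight) <= budget

# Hypothetical costs of one policy evaluated in four randomized simulators.
costs = [0.8, 1.0, 1.2, 1.0]
print(is_safe(costs, penalty_weight=2.0, budget=1.5))
```

A larger `penalty_weight` makes the constraint more conservative: a policy whose cost varies strongly across domains is rejected even when its average cost is within budget.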
VelLMes: A high-interaction AI-based deception framework
Sladić, Muris, Valeros, Veronica, Catania, Carlos, Garcia, Sebastian
There are very few SotA deception systems based on Large Language Models. The existing ones are limited only to simulating one type of service, mainly SSH shells. These systems - but also the deception technologies not based on LLMs - lack an extensive evaluation that includes human attackers. Generative AI has recently become a valuable asset for cybersecurity researchers and practitioners, and the field of cyber-deception is no exception. Researchers have demonstrated how LLMs can be leveraged to create realistic-looking honeytokens, fake users, and even simulated systems that can be used as honeypots. This paper presents an AI-based deception framework called VelLMes, which can simulate multiple protocols and services such as SSH Linux shell, MySQL, POP3, and HTTP. All of these can be deployed and used as honeypots, thus VelLMes offers a variety of choices for deception design based on the users' needs. VelLMes is designed to be attacked by humans, so interactivity and realism are key for its performance. We evaluate the generative capabilities and the deception capabilities. Generative capabilities were evaluated using unit tests for LLMs. The results of the unit tests show that, with careful prompting, LLMs can produce realistic-looking responses, with some LLMs having a 100% passing rate. In the case of the SSH Linux shell, we evaluated deception capabilities with 89 human attackers. The results showed that about 30% of the attackers thought that they were interacting with a real system when they were assigned an LLM-based honeypot. Lastly, we deployed 10 instances of the SSH Linux shell honeypot on the Internet to capture real-life attacks. Analysis of these attacks showed us that LLM honeypots simulating Linux shells can perform well against unstructured and unexpected attacks on the Internet, responding correctly to most of the issued commands.
- Europe > Czechia > Prague (0.05)
- Asia > Singapore (0.04)
- South America > Uruguay > Maldonado > Maldonado (0.04)
- South America > Argentina > Cuyo > Mendoza Province > Mendoza (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.34)
The Crucial Role of Problem Formulation in Real-World Reinforcement Learning
Schäfer, Georg, Krau, Tatjana, Rehrl, Jakob, Huber, Stefan, Hirlaender, Simon
Reinforcement Learning (RL) offers promising solutions for control tasks in industrial cyber-physical systems (ICPSs), yet its real-world adoption remains limited. This paper demonstrates how seemingly small but well-designed modifications to the RL problem formulation can substantially improve performance, stability, and sample efficiency. We identify and investigate key elements of RL problem formulation and show that these enhance both learning speed and final policy quality. Our experiments use a one-degree-of-freedom (1-DoF) helicopter testbed, the Quanser Aero 2, which features non-linear dynamics representative of many industrial settings. In simulation, the proposed problem design principles yield more reliable and efficient training, and we further validate these results by training the agent directly on physical hardware. The encouraging real-world outcomes highlight the potential of RL for ICPS, especially when careful attention is paid to the design principles of problem formulation. Overall, our study underscores the crucial role of thoughtful problem formulation in bridging the gap between RL research and the demands of real-world industrial systems.
- Europe > Austria > Salzburg > Salzburg (0.05)
- North America > United States > Washington > King County > Seattle (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Safe Continual Domain Adaptation after Sim2Real Transfer of Reinforcement Learning Policies in Robotics
Josifovski, Josip, Gu, Shangding, Malmir, Mohammadhossein, Huang, Haoliang, Auddy, Sayantan, Navarro-Guerrero, Nicolás, Spanos, Costas, Knoll, Alois
Domain randomization has emerged as a fundamental technique in reinforcement learning (RL) to facilitate the transfer of policies from simulation to real-world robotic applications. Many existing domain randomization approaches have been proposed to improve robustness and sim2real transfer. These approaches rely on wide randomization ranges to compensate for the unknown actual system parameters, leading to robust but inefficient real-world policies. In addition, the policies pretrained in the domain-randomized simulation are fixed after deployment due to the inherent instability of the optimization processes based on RL and the necessity of sampling exploitative but potentially unsafe actions on the real system. This limits the adaptability of the deployed policy to the inevitably changing system parameters or environment dynamics over time. We leverage safe RL and continual learning under domain-randomized simulation to address these limitations and enable safe deployment-time policy adaptation in real-world robot control. The experiments show that our method enables the policy to adapt and fit to the current domain distribution and environment dynamics of the real system while minimizing safety risks and avoiding issues like catastrophic forgetting of the general policy found in randomized simulation during the pretraining phase. Videos and supplementary material are available at https://safe-cda.github.io/.
- Europe > Germany (0.46)
- Oceania > Australia (0.46)
- North America > United States > California (0.28)
Pioneer: Physics-informed Riemannian Graph ODE for Entropy-increasing Dynamics
Sun, Li, Zhang, Ziheng, Wang, Zixi, Wang, Yujie, Wan, Qiqi, Li, Hao, Peng, Hao, Yu, Philip S.
Modeling dynamic interacting systems is important for understanding and simulating real-world systems. Such a system is typically described as a graph, where multiple objects dynamically interact with each other and evolve over time. In recent years, graph Ordinary Differential Equations (ODEs) have received increasing research attention. While achieving encouraging results, existing solutions prioritize the traditional Euclidean space and neglect the intrinsic geometry of the system and physical laws, e.g., the principle of entropy increase. These limitations motivate us to rethink the system dynamics from the fresh perspective of Riemannian geometry and to pose a more realistic problem of physics-informed dynamic system modeling, considering the underlying geometry and physical laws for the first time. In this paper, we present a novel physics-informed Riemannian graph ODE for a wide range of entropy-increasing dynamic systems (termed Pioneer). In particular, we formulate a differential system on the Riemannian manifold, where a manifold-valued graph ODE is governed by the proposed constrained Ricci flow and a manifold-preserving Gyro-transform aware of the system geometry. Theoretically, we show that our formulation is provably entropy non-decreasing, obeying the physical laws. Empirical results show the superiority of Pioneer on real datasets.
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > Greenland (0.04)
All AI Models are Wrong, but Some are Optimal
Anand, Akhil S, Sawant, Shambhuraj, Reinhardt, Dirk, Gros, Sebastien
AI models that predict the future behavior of a system (a.k.a. predictive AI models) are central to intelligent decision-making. However, decision-making using predictive AI models often results in suboptimal performance. This is primarily because AI models are typically constructed to best fit the data, and hence to predict the most likely future rather than to enable high-performance decision-making. The hope that such prediction enables high-performance decisions is neither guaranteed in theory nor established in practice. In fact, there is increasing empirical evidence that predictive models must be tailored to decision-making objectives for performance. In this paper, we establish formal (necessary and sufficient) conditions that a predictive model (AI-based or not) must satisfy for a decision-making policy established using that model to be optimal. We then discuss their implications for building predictive AI models for sequential decision-making.
- Europe > Norway (0.28)
- North America > Canada (0.28)
- Energy > Energy Storage (0.46)
- Energy > Oil & Gas (0.30)
Velocity-History-Based Soft Actor-Critic Tackling IROS'24 Competition "AI Olympics with RealAIGym"
Faust, Tim Lukas, Maraqten, Habib, Aghadavoodi, Erfan, Belousov, Boris, Peters, Jan
The "AI Olympics with RealAIGym" competition challenges participants to stabilize chaotic underactuated dynamical systems with advanced control algorithms. In this paper, we present a novel solution submitted to the IROS'24 competition, which builds upon Soft Actor-Critic (SAC), a popular model-free entropy-regularized Reinforcement Learning (RL) algorithm. We add a "context" vector to the state, which encodes the immediate history via a Convolutional Neural Network (CNN) to counteract the unmodeled effects on the real system. Our method achieves high performance scores and competitive robustness scores on both tracks of the competition: Pendubot and Acrobot.
- Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.05)
- North America > United States > New York (0.04)
- Europe > Russia (0.04)
- Asia > Russia (0.04)
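The idea of appending a history-derived context vector to the state, as described in the velocity-history SAC abstract, can be sketched with a single hand-written 1D convolution. This is a toy illustration under stated assumptions: the functions `conv1d`, `relu`, and `augment_state` are hypothetical, the kernel is fixed rather than learned, and the actual network architecture is not given in the abstract.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation, the core operation of a CNN layer."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]

def relu(values):
    """Elementwise rectifier."""
    return [max(0.0, v) for v in values]

def augment_state(state, velocity_history, kernel):
    """Append a context vector computed from recent velocities to the state."""
    context = relu(conv1d(velocity_history, kernel))
    return list(state) + context

# Hypothetical joint angle plus three recent velocity samples.
state = [0.5]
history = [1.0, 2.0, 4.0]
kernel = [-1.0, 1.0]  # finite-difference filter; a trained CNN would learn this
print(augment_state(state, history, kernel))
```

The policy and critic would then take the augmented state as input, so information about recent dynamics is available even when the nominal state alone does not capture unmodeled effects.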
An Efficient Multi-Robot Arm Coordination Strategy for Pick-and-Place Tasks using Reinforcement Learning
Jermann, Tizian, Kolvenbach, Hendrik, Estay, Fidel Esquivel, Kramer, Koen, Hutter, Marco
Plastic pollution in rivers has become a pressing global issue, with 11 million tons of plastic waste entering the ocean annually, 80% of which is caused by 1,000 major polluting rivers [1]. To address this problem, it is desired to develop a solution capable of removing plastic and other waste objects without interfering with the existing flora and fauna essential to river ecosystems [2]. Our Autonomous River Cleanup (ARC) project, initiated in 2019, leverages robotics and automation to remove plastic waste from rivers. In order to increase the capacity at which this can be done, we enhance the existing single arm sorting station [3] with additional robot arms. For multiple robot agents to efficiently sort waste on a conveyor belt, we develop and evaluate novel strategy algorithms using reinforcement learning that assign pick-and-place (PnP) tasks to the respective robot agents (Figure 1). Given a set of objects on the moving conveyor belt, the robot agents are tasked with removing waste objects, whilst bio-matter is ignored and collected at the end of the belt. The challenge is to allocate each robot optimally with PnP operations for objects within its reachable workspace.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)